On “Article Omission” in German and the “Uniform Information Density Hypothesis”

نویسندگان

  • Eva Horch
  • Ingo Reich
چکیده

This paper investigates whether Information Theory (IT) in the tradition of Shannon (1948) and in particular the “Uniform Information Density Hypothesis” (UID, see Jäger 2010) might contribute to our understanding of a phenomenon called “article omission” (AO) in the literature. To this effect, we trained language models on a corpus of 17 different text types (from prototypically written text types like legal texts to prototypically spoken text types like dialogue) with about 2.000 sentences each and compared the density profiles of minimal pairs. Our results suggest, firstly, that an overtly realized article significantly reduces the surprisal on the following head noun (as was to be expected). It also shows, however, that omitting the article results in a non-uniform distribution (thus contradicting the UID). Since empirically AO seems not to depend on specific lexical items, we also trained our language models on a more abstract level (part of speech). With respect to this level of analysis we were able to show that, again, an overtly realized article significantly reduces the surprisal on the following head noun, but at the same time AO results in a more uniform distribution of information. In the case of AO the UID thus seems to operate on the level of POS rather than on the lexical level.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Information density of encodings: The role of syntactic variation in comprehension

The Uniform Information Density (UID) hypothesis links production strategies with comprehension processes, predicting that speakers will utilize flexibility in encoding in order to increase uniformity in the rate of information transmission, as measured by surprisal (Jaeger, 2010). Evidence in support of UID comes primarily from studies focusing on word-level effects, e.g. demonstrating that su...

متن کامل

Uniform Information Density at the Level of Discourse Relations: Negation Markers and Discourse Connective Omission

About half of the discourse relations annotated in Penn Discourse Treebank (Prasad et al., 2008) are not explicitly marked using a discourse connective. But we do not have extensive theories of when or why a discourse relation is marked explicitly or when the connective is omitted. Asr and Demberg (2012a) have suggested an information-theoretic perspective according to which discourse connectiv...

متن کامل

Uniform Surprisal at the Level of Discourse Relations: Negation Markers and Discourse Connective Omission

About half of the discourse relations annotated in Penn Discourse Treebank (Prasad et al., 2008) are not explicitly marked using a discourse connective. But we do not have extensive theories of when or why a discourse relation is marked explicitly or when the connective is omitted. Asr and Demberg (2012a) have suggested an information-theoretic perspective according to which discourse connectiv...

متن کامل

Acquisition and Accurate Use of English Articles by Persian Speakers

This study was conducted with the purpose of examining Persian speakers’ article acquisition and use with reference to Ionin, Ko and Wexler’s (2004) model, which is based on the prediction of Fluctuation Hypothesis (FH) that EFL learners of [-article] languages, like Persian, make erroneous article use in [+definite, -specific] and [-definite, +specific] contexts. From among the students of an ...

متن کامل

A New Method for Sperm Detection in Infertility Cure: Hypothesis Testing Based on Fuzzy Entropy Decision

In this paper, a new method is introduced for sperm detection in microscopic images for infertility treatment. In this method, firstly a hypothesis testing function is defined to separate sperms from plasma, non-sperm semen particles and noise. Then, some primary candidates are selected for sperms by watershed-based segmentation algorithm. Finally, candidates are either confirmed or rejected us...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016